
    Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting

    Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: strategically upweighting the training loss on potentially idiomatic sentences, and using retrieval-augmented models. This not only improves the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% absolute, but may also benefit non-idiomatic sentences. Comment: EMNLP 202
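
    The abstract names loss upweighting on potentially idiomatic sentences but does not give the weighting scheme. A minimal sketch, assuming a sentence-level negative log-likelihood, a per-sentence boolean flag marking a sentence as potentially idiomatic (how that flag is produced is not specified here), and a hypothetical `idiom_weight` hyperparameter:

```python
import math
from typing import List, Sequence


def sentence_nll(token_log_probs: Sequence[float]) -> float:
    """Negative log-likelihood of one target sentence from its per-token log-probabilities."""
    return -sum(token_log_probs)


def weighted_loss(
    batch_log_probs: List[Sequence[float]],
    is_idiomatic: List[bool],
    idiom_weight: float = 2.0,  # hypothetical value, not taken from the paper
) -> float:
    """Batch loss where sentences flagged as potentially idiomatic count idiom_weight times.

    Dividing by the number of sentences (not the sum of weights) means upweighting
    genuinely increases the gradient contribution of idiomatic sentences.
    """
    if len(batch_log_probs) != len(is_idiomatic) or not batch_log_probs:
        raise ValueError("need one idiomaticity flag per sentence, and a non-empty batch")
    total = 0.0
    for log_probs, flagged in zip(batch_log_probs, is_idiomatic):
        weight = idiom_weight if flagged else 1.0
        total += weight * sentence_nll(log_probs)
    return total / len(batch_log_probs)
```

    Setting `idiom_weight=1.0` recovers the ordinary unweighted loss, which makes the weight easy to ablate.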

    Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity

    Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms, i.e. phenomena at the intersection of syntax and semantics. We present the semantic notion of agentivity as a case study for probing such interactions. We created a novel evaluation dataset by utilizing the unique linguistic properties of a subset of optionally transitive English verbs. This dataset was used to prompt varying sizes of three model classes to see whether they are sensitive to agentivity at the lexical level, and whether they can appropriately employ these word-level priors given a specific syntactic context. Overall, GPT-3 text-davinci-003 performs extremely well across all experiments, outperforming all other models tested by a wide margin. In fact, its results correlate better with human judgements than both syntactic and semantic corpus statistics do. This suggests that, for certain tasks, LMs may serve as more useful tools for linguistic annotation, theory testing, and discovery than select corpora.

    EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

    We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA achieved state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881.
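
    To illustrate only the kNN idea named above (EUREKA itself is an ensemble, and the abstract does not say which encoder, similarity measure, or k it uses), the sketch below labels a sentence embedding by majority vote among its k most cosine-similar labelled neighbours. The embeddings, the default `k=3`, and the 1 = euphemistic / 0 = literal label encoding are assumptions:

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(
    query: Sequence[float],
    train: List[Tuple[Sequence[float], int]],
    k: int = 3,  # assumed; an odd k avoids ties for binary labels
) -> int:
    """Majority label among the k training embeddings most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

    In practice the vectors would come from a sentence encoder; the toy two-dimensional vectors in a quick check only show the voting logic.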

    Microstructure and texture evolutions in FeCrAl cladding tube during pilger processing

    The microstructure of FeCrAl cladding tubes depends on their fabrication process history. In this study, the microstructural characteristics of wrought FeCrAl alloys during industrial pilger processing into thin-walled tubes were investigated. The hot-extruded tube showed ∼100 μm equiaxed grains with a weak α∗-fiber in {h11}<1/h12> texture, while pilger rolling changed the microstructure to fragmented grains elongated along the rolling direction. The pilgered textures could be predicted with the VPSC model. Inter-pass annealing at 800–850 °C for 1 h results in recovery and recrystallization of the ferritic matrix and restoration of ductility. The final finished tube shows fine recrystallized grains (∼11 μm) with a dominant γ-fiber in three dimensions. Pilger rolling enhanced the α-fiber, while annealing weakened the α-fiber and enhanced the γ-fiber. The morphology of the Laves precipitates evolved in the sequence faceted needle-like → spherical → faceted ellipsoidal. Thermomechanical processing resulted in cladding tubes with an area fraction of ∼5% and a number density of 5 × 10−11 m−2 in Laves precipitates, which is half that of the first-pilgered tube. Laves precipitates pin the grain boundaries, controlling the microstructure and preventing grain coarsening.

    Witness: The Modern Writer as Witness

    Editor's Note [Excerpt] Magic can mean many different things, especially for writers. Magic can be an illusion, a sleight of hand designed to trick onlookers into believing the impossible. Or magic can be a supernatural force in a world of harsh reality, a set of beliefs that sits just outside the realms of organized religion and advanced technology. Wizards and demons, Las Vegas entertainers and houngans: they all practice a kind of sorcery. For poets and prose writers, though, magic affords an opportunity for us to stretch the limitations of the physical world in search of new themes, settings, and characters. Magic is a door we eagerly walk through to reach new lands. We at Witness have thoroughly enjoyed the process of selecting the themed works we have collected here, mainly because the idea of enchantment is inspiring. There is the possibility of positive charms; there is a chance for dark witchery. And sometimes the spell cast by a character is nebulous, difficult to categorize. It’s arguable that we cherish these incantations the most, since they leave us in a state of wonderment bordering on disorientation. Yes, magic can also leave us bewildered and thankful for the bewilderment.

    Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

    Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed for using feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention. Comment: Work in Progress

    Two-station measurement of Rayleigh-wave phase velocities for the Huatung basin, the westernmost Philippine Sea, with OBS : implications for regional tectonics

    Author Posting. © The Authors, 2009. This article is posted here by permission of John Wiley & Sons for personal use, not for redistribution. The definitive version was published in Geophysical Journal International 179 (2009): 1859-1869, doi:10.1111/j.1365-246X.2009.04391.x. A broad-band ocean-bottom seismometer (OBS) deployed ~180 km east of Taiwan provides a first glimpse into the upper mantle beneath the westernmost section of the Philippine Sea, the Huatung basin (HB). We measured interstation phase velocities of Rayleigh waves between the OBS and stations on the eastern coast of Taiwan. The phase velocities show smooth variations from 3.8 to 3.9 km s−1 for periods of 25–40 s. In this narrow period range, phase velocities are comparable to those characterizing the 15–30 Ma Parece-Vela basin of the Philippine Sea. Modelling of the finite-frequency effect proves the validity of the measurement for the average HB. The shear-wave velocity models inverted from the 25 to 40 s dispersion show a velocity at lithospheric depths about 0.1 km s−1 lower than that of the west Philippine Sea, which agrees with the age effect derived from the Pacific pure-path model. Inversions incorporating the less reliable data above 40 s yield a shear velocity <4.0 km s−1 below 150 km, an unrealistic value even for a hotspot plume environment. The seismological evidence, together with the correlation in seafloor depth, suggests that the HB and the Parece-Vela basin may have a similar age. This is at odds with the previous geochronological study suggesting an early-Cretaceous age for the HB. Thermal rejuvenation of the lithosphere was examined as a potential way to reconcile the two age models. The research is supported by the National Science Council, Taiwan, Republic of China, under grant NSC 96–2745-M-001–005
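
    The two-station method rests on the phase difference of the same Rayleigh wave recorded at two stations lying close to a common great-circle path. A minimal sketch, assuming the path-length difference is known, the measured phase delay is positive and wrapped into [0, 2π), and the 2πN cycle ambiguity is resolved by keeping only cycle counts that give a physically plausible velocity. The distance, period, and velocity bounds in the check are hypothetical, not the study's values:

```python
import math
from typing import List


def phase_velocity(distance_km: float, period_s: float,
                   wrapped_dphi: float, n_cycles: int) -> float:
    """c = omega * distance / (wrapped phase delay + 2*pi*N), in km/s."""
    omega = 2 * math.pi / period_s
    total_phase = wrapped_dphi + 2 * math.pi * n_cycles
    if total_phase <= 0:
        raise ValueError("total phase delay must be positive")
    return omega * distance_km / total_phase


def plausible_cycles(distance_km: float, period_s: float, wrapped_dphi: float,
                     c_min: float, c_max: float, n_max: int = 10) -> List[int]:
    """Cycle counts N whose implied phase velocity falls inside [c_min, c_max]."""
    result = []
    for n in range(n_max + 1):
        if wrapped_dphi + 2 * math.pi * n <= 0:
            continue  # N = 0 with zero wrapped phase is undefined
        c = phase_velocity(distance_km, period_s, wrapped_dphi, n)
        if c_min <= c <= c_max:
            result.append(n)
    return result
```

    Only one cycle count usually survives a reasonable velocity window at long periods; at shorter periods or longer paths several may, which is one reason interstation measurements are checked against neighbouring periods.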

    Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis

    BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings. METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681. FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. 
The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1–31·8; I² = 98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8–10·7; I² = 92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6–36·8; I² = 94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3–47·6; I² = 98%) than in other migrant groups (6·6%, 1·8–11·3; I² = 92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1–55·1; I² = 96%) than in migrants in hospitals (24·3%, 16·1–32·6; I² = 98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations. INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration. FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimicrobial Resistance at Imperial College London.
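
    The review reports pooled prevalences from random-effects models together with I² heterogeneity, but the abstract does not name the estimator. A minimal sketch, assuming the DerSimonian–Laird estimator on untransformed proportions with binomial within-study variances and a normal-approximation 95% interval. Prevalence meta-analyses often transform proportions first (for example logit or Freeman–Tukey), partly because untransformed intervals can spill outside [0, 1]; the study counts in the check are hypothetical:

```python
import math
from typing import List, Tuple


def dersimonian_laird(events: List[int],
                      totals: List[int]) -> Tuple[float, Tuple[float, float], float]:
    """Random-effects pooled proportion, 95% CI, and I^2 (percent)."""
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need at least two studies with matching counts")
    p = [e / n for e, n in zip(events, totals)]
    if any(pi <= 0 or pi >= 1 for pi in p):
        # 0% or 100% prevalence gives zero variance; real analyses use a correction or transform
        raise ValueError("sketch requires every study proportion strictly between 0 and 1")
    v = [pi * (1 - pi) / n for pi, n in zip(p, totals)]  # within-study binomial variance
    w = [1 / vi for vi in v]                              # fixed-effect (inverse-variance) weights
    p_fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, p))  # Cochran's Q
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]                  # random-effects weights
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2
```

    Very high I² values such as the 92–98% reported above indicate that most of the observed spread between studies reflects genuine heterogeneity rather than sampling error.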

    Surgical site infection after gastrointestinal surgery in high-income, middle-income, and low-income countries: a prospective, international, multicentre cohort study

    Background: Surgical site infection (SSI) is one of the most common infections associated with health care, but its importance as a global health priority is not fully understood. We quantified the burden of SSI after gastrointestinal surgery in countries in all parts of the world. Methods: This international, prospective, multicentre cohort study included consecutive patients undergoing elective or emergency gastrointestinal resection within 2-week periods at any health-care facility in any country. Countries with participating centres were stratified into high-income, middle-income, and low-income groups according to the UN's Human Development Index (HDI). Data variables from the GlobalSurg 1 study and other studies that have been found to affect the likelihood of SSI were entered into risk adjustment models. The primary outcome measure was the 30-day SSI incidence (defined by US Centers for Disease Control and Prevention criteria for superficial and deep incisional SSI). Relationships with explanatory variables were examined using Bayesian multilevel logistic regression models. This trial is registered with ClinicalTrials.gov, number NCT02662231. Findings: Between Jan 4, 2016, and July 31, 2016, 13 265 records were submitted for analysis. 12 539 patients from 343 hospitals in 66 countries were included. 7339 (58·5%) patients were from high-HDI countries (193 hospitals in 30 countries), 3918 (31·2%) patients were from middle-HDI countries (82 hospitals in 18 countries), and 1282 (10·2%) patients were from low-HDI countries (68 hospitals in 18 countries). In total, 1538 (12·3%) patients had SSI within 30 days of surgery. The incidence of SSI varied between countries with high (691 [9·4%] of 7339 patients), middle (549 [14·0%] of 3918 patients), and low (298 [23·2%] of 1282 patients) HDI (p < 0·001). 
The highest SSI incidence in each HDI group was after dirty surgery (102 [17·8%] of 574 patients in high-HDI countries; 74 [31·4%] of 236 patients in middle-HDI countries; 72 [39·8%] of 181 patients in low-HDI countries). Following risk factor adjustment, patients in low-HDI countries were at greatest risk of SSI (adjusted odds ratio 1·60, 95% credible interval 1·05–2·37; p=0·030). 132 (21·6%) of 610 patients with an SSI and a microbiology culture result had an infection that was resistant to the prophylactic antibiotic used. Resistant infections were detected in 49 (16·6%) of 295 patients in high-HDI countries, in 37 (19·8%) of 187 patients in middle-HDI countries, and in 46 (35·9%) of 128 patients in low-HDI countries (p < 0·001). Interpretation: Countries with a low HDI carry a disproportionately greater burden of SSI than countries with a middle or high HDI and might have higher rates of antibiotic resistance. In view of WHO recommendations on SSI prevention that highlight the absence of high-quality interventional research, urgent, pragmatic, randomised trials based in low-income and middle-income countries (LMICs) are needed to assess measures aiming to reduce this preventable complication.
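
    The headline percentages above can be recomputed directly from the reported counts, which is a quick consistency check on the abstract. The sketch below does that and also computes a crude (unadjusted) odds ratio of SSI for low-HDI versus high-HDI countries. That crude ratio is not reported in the abstract and is not comparable to the adjusted odds ratio of 1·60, which accounts for patient and procedure risk factors:

```python
from typing import Dict, Tuple

# (patients with SSI, patients) per HDI group, as reported in the abstract
SSI_BY_HDI: Dict[str, Tuple[int, int]] = {
    "high": (691, 7339),
    "middle": (549, 3918),
    "low": (298, 1282),
}

# (resistant infections, SSI patients with a culture result) per HDI group
RESISTANT_BY_HDI: Dict[str, Tuple[int, int]] = {
    "high": (49, 295),
    "middle": (37, 187),
    "low": (46, 128),
}


def pct(cases: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * cases / total, 1)


def overall(groups: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum cases and denominators across all groups."""
    cases = sum(c for c, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return cases, total


def crude_odds_ratio(exposed: Tuple[int, int], reference: Tuple[int, int]) -> float:
    """Unadjusted odds ratio: odds of the outcome in `exposed` over odds in `reference`."""
    a, n1 = exposed
    c, n0 = reference
    return (a / (n1 - a)) / (c / (n0 - c))
```

    The group totals add up to the 1538 SSIs among 12 539 patients and the 132 resistant infections among 610 cultures stated above, and the crude low-versus-high odds ratio comes out at about 2.91.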